56 research outputs found
Recommended from our members
WordsEye: An Automatic Text-to-Scene Conversion System
Natural language is an easy and effective medium for describing
visual ideas and mental images. Thus, we foresee the emergence of
language-based 3D scene generation systems to let ordinary users
quickly create 3D scenes without having to learn special software,
acquire artistic skills, or even touch a desktop window-oriented
interface. WordsEye is such a system for automatically converting
text into representative 3D scenes. WordsEye relies on a large
database of 3D models and poses to depict entities and actions. Every
3D model can have associated shape displacements, spatial tags,
and functional properties to be used in the depiction process. We
describe the linguistic analysis and depiction techniques used by
WordsEye along with some general strategies by which more abstract
concepts are made depictable.
Painting Pictures with Words - From Theory to System
A picture paints a thousand words, or so we are told. But how many words does it take to paint a picture? And how can words create pictures in the first place? In this thesis we examine a new theory of linguistic meaning -- where the meaning of words and sentences is determined by the scenes they evoke. We describe how descriptive text is parsed and semantically interpreted and how the semantic interpretation is then depicted as a rendered 3D scene. In doing so, we describe WordsEye, our text-to-scene system, and touch upon many fascinating issues of lexical semantics, knowledge representation, and what we call "graphical semantics." We introduce the notion of vignettes as a way to bridge between function and form, between the semantics of language and the grounded semantics of 3D scenes. And we describe how VigNet, our lexical semantic and graphical knowledge base, mediates the whole process.
In the second part of this thesis, we describe four different ways WordsEye has been tested. We first discuss an evaluation of the system in an educational environment where WordsEye was shown to significantly improve literacy skills for sixth grade students versus a control group. We then compare WordsEye with Google Image Search on "realistic" and "imaginative" sentences in order to evaluate its performance on a sentence-by-sentence level and test its potential as a way to augment existing image search tools. Thirdly, we describe what we have learned in testing WordsEye as an online 3D authoring system where it has attracted 20,000 real-world users who have performed almost one million scene depictions. Finally, we describe tests of WordsEye as an elicitation tool for field linguists studying endangered languages. We then sum up by presenting a roadmap for enhancing the capabilities of the system and identifying key
opportunities and issues to be addressed.
Spatial Relations in Text-to-Scene Conversion
Spatial relations play an important role in our understanding of language. In particular, they are a crucial component in descriptions of scenes in the world. WordsEye (www.wordseye.com) is a system for automatically converting natural language text into 3D scenes representing the meaning of that text. Natural language offers an interface to scene generation that is intuitive and immediately approachable by anyone, without any special skill or training. WordsEye has been used by several thousand users on the web to create approximately 15,000 fully rendered scenes. We describe how the system incorporates geometric and semantic knowledge about objects and their parts and the spatial relations that hold among these in order to depict spatial relations in 3D scenes.
VigNet: Grounding Language in Graphics using Frame Semantics
This paper introduces Vignette Semantics, a lexical semantic theory based on Frame Semantics that represents conceptual and graphical relations. We also describe a lexical resource that implements this theory, VigNet, and its application in text-to-scene generation.
Collecting Semantic Data by Mechanical Turk for the Lexical Knowledge Resource of a Text-to-Picture Generating System
WordsEye is a system for automatically converting natural language text into 3D scenes representing the meaning of that text. At the core of WordsEye is the Scenario-Based Lexical Knowledge Resource (SBLR), a unified knowledge base and representational system for expressing lexical and real-world knowledge needed to depict scenes from text. To enrich a portion of the SBLR, we need to fill out some contextual information about its objects, including information about their typical parts, typical locations and typical objects located near them. This paper explores our proposed methodology to achieve this goal. First, we try to collect some semantic information by using Amazon’s Mechanical Turk (AMT). Then, we manually filter and classify the collected data. Finally, we compare the manual results with the output of some automatic filtration techniques which use several WordNet similarity and corpus association measures.
Data Collection and Normalization for Building the Scenario-Based Lexical Knowledge Resource of a Text-to-Scene Conversion System
WordsEye is a system for converting from English text into three-dimensional graphical scenes that represent that text. It works by performing syntactic and semantic analyses on the input text, producing a description of the arrangement of objects in a scene. At the core of WordsEye is the Scenario-Based Lexical Knowledge Resource (SBLR), a unified knowledge base and representational system for expressing lexical and real-world knowledge needed to depict scenes from text. This paper explores information collection methods for building the SBLR, using Amazon’s Mechanical Turk (AMT) and manual normalization of raw AMT data. The paper follows with manual review of existing relations in the SBLR and classification of the AMT data into existing and new semantic relations. Since manual annotation is a time-consuming and expensive approach, we also explored the use of automatic normalization of AMT data through log-odds and log-likelihood ratios extracted from the English Gigaword corpus, as well as through WordNet similarity measures.
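The abstract above names log-odds and log-likelihood ratios as corpus association measures but does not give their exact formulation. As a rough illustration only, the sketch below computes both from a 2x2 co-occurrence table, assuming the standard smoothed log odds ratio and Dunning's G² statistic. The function names, the smoothing constant, and the counts are hypothetical and are not taken from the paper.

```python
import math


def log_odds_ratio(k11, k12, k21, k22, smoothing=0.5):
    """Log odds ratio of a 2x2 co-occurrence table.

    k11: contexts containing both the object word and the candidate word
    k12: contexts with the object word only
    k21: contexts with the candidate word only
    k22: contexts with neither
    The additive smoothing (an assumption) avoids division by zero.
    """
    a, b, c, d = (k + smoothing for k in (k11, k12, k21, k22))
    return math.log((a * d) / (b * c))


def log_likelihood_ratio(k11, k12, k21, k22):
    """Dunning's G^2 statistic: 2 * sum(observed * ln(observed / expected))."""
    observed = ((k11, k12), (k21, k22))
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    total = rows[0] + rows[1]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                expected = rows[i] * cols[j] / total
                g2 += o * math.log(o / expected)
    return 2.0 * g2


if __name__ == "__main__":
    # Hypothetical counts for one (object, candidate location) word pair.
    counts = (30, 10, 10, 50)
    print("log-odds:", log_odds_ratio(*counts))
    print("G^2:", log_likelihood_ratio(*counts))
```

Under these assumptions, a pair whose words co-occur no more often than chance scores near zero on both measures, so a threshold on either score could be used to filter out weakly associated responses.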
Evaluating a Text-to-Scene Generation System as an Aid to Literacy
We discuss classroom experiments using WordsEye, a system for automatically generating 3D scenes from English textual descriptions. Input is syntactically and semantically processed to identify a set of graphical objects and constraints which are then rendered as a 3D scene. We describe experiments with the system in a summer literacy enrichment program conducted at the Harlem Educational Activities Fund with 6th grade students, in which students using the system had significantly greater improvement in their literary character and story descriptions in pre- and post-test essays compared with a control. Students reported that using the system helped them imagine the events in the stories they were reading better. We also observed that social interaction engendered by this process was a strong motivator.
Frame Semantics in Text-to-Scene Generation
3D graphics scenes are difficult to create, requiring users to learn and utilize a series of complex menus, dialog boxes, and often tedious direct manipulation techniques. By giving up some amount of control afforded by such interfaces we have found that users can use natural language to quickly and easily create a wide variety of 3D scenes. Natural language offers an interface that is intuitive and immediately accessible by anyone, without requiring any special skill or training. The WordsEye system (http://www.wordseye.com) has been used by several thousand users on the web to create over 10,000 scenes. The system relies on a large database of 3D models and poses to depict entities and actions. We describe how the current version of the system incorporates the type of lexical and real-world knowledge needed to depict scenes from language.
- …